Explaining GPT Technology
How I expect to be using AI Large Language Models in the future.
A Jun 2024 summary from Ethan Mollick points to the Jargon-Free Guide and this more technical (but remarkably clear) video; the classic work by Wolfram is also good.
The Verge (Jul 2024) has an easy-to-understand summary of important AI Terminology.
Sample Exchange - Chinese Philosophy
LLMs for Dummies is a one-page summary in simpler language.
For more, see this explainer from Ars Technica (Timothy B. Lee and Sean Trott) and this explainer from the FT (Madhumita Murgia).
Also see how Sarah Tavel frames pricing for AI tools as “work done” rather than the per-seat pricing we’ve seen in traditional SaaS.
OpenAI engineers wrote a short explainer, How does ChatGPT work. It’s an easy read, with diagrams, that walks through tokenization, embedding, multiplication by the model weights, and sampling a prediction.
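To make those four steps concrete, here is a deliberately tiny sketch in Python/NumPy. Everything in it (the five-word vocabulary, the matrix sizes, the random “weights”) is invented for illustration; a real GPT uses a learned subword tokenizer and a trained transformer with billions of parameters.

```python
# Toy sketch of the four steps the OpenAI explainer describes:
# tokenize -> embed -> multiply by model weights -> sample a prediction.
# All sizes and the "model" here are made up for illustration.
import numpy as np

rng = np.random.default_rng(0)

# 1. Tokenization: map text to integer ids (real models use BPE subwords, not words).
vocab = {"the": 0, "cat": 1, "sat": 2, "on": 3, "mat": 4}
tokens = [vocab[w] for w in "the cat sat on the".split()]

# 2. Embedding: look up a vector for each token id.
d_model = 8
embedding_table = rng.normal(size=(len(vocab), d_model))
x = embedding_table[tokens]                      # (seq_len, d_model)

# 3. "Multiply by model weights": a real transformer applies many attention
#    and MLP layers; a single random linear layer stands in for them here.
W = rng.normal(size=(d_model, d_model))
hidden = np.tanh(x @ W)

# 4. Predict the next token: project the last position onto the vocabulary,
#    turn the scores into probabilities, and sample one.
W_out = rng.normal(size=(d_model, len(vocab)))
logits = hidden[-1] @ W_out
probs = np.exp(logits - logits.max())
probs /= probs.sum()
next_token = rng.choice(len(vocab), p=probs)
print("next token id:", next_token)
```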
General Overview
Jonathan Shriftman’s The Building Blocks of Generative AI: A Beginners Guide to The Generative AI Infrastructure Stack is a lengthy overview of the various pieces of an end-to-end GPT-based system.
Image Generation
A very easy-to-follow description of how image-generation apps, like Dall-E or Midjourney, work: How AI Image Models Work
Technical Terms
RNN: see Andrej Karpathy’s The Unreasonable Effectiveness of Recurrent Neural Networks, a step-by-step description of RNNs (from 2015); a minimal code sketch of the recurrence follows below.
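For readers who want the recurrence in code rather than prose, here is a minimal, illustrative sketch of a single RNN step in NumPy. The tanh update mirrors the one in Karpathy’s post; the sizes, weights, and character ids are made up.

```python
# Minimal sketch of one recurrent step: the hidden state carries
# information from each step to the next. Sizes are invented.
import numpy as np

rng = np.random.default_rng(0)
hidden_size, vocab_size = 16, 27          # e.g. 26 letters + space (assumed)

W_xh = rng.normal(scale=0.01, size=(hidden_size, vocab_size))
W_hh = rng.normal(scale=0.01, size=(hidden_size, hidden_size))
W_hy = rng.normal(scale=0.01, size=(vocab_size, hidden_size))

def rnn_step(x, h):
    """One step: new hidden state from input + old state, plus output scores."""
    h_new = np.tanh(W_xh @ x + W_hh @ h)
    y = W_hy @ h_new                      # unnormalized scores over the vocab
    return h_new, y

# Feed a short one-hot encoded sequence through the loop.
h = np.zeros(hidden_size)
for char_id in [3, 14, 6]:                # arbitrary character ids
    x = np.zeros(vocab_size)
    x[char_id] = 1.0
    h, y = rnn_step(x, h)
print("score vector shape:", y.shape)
```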
LoRA: Low-Rank Adaptation
From Microsoft’s Edward Hu:
LoRA freezes the pre-trained model weights and injects trainable rank decomposition matrices into each layer of the Transformer architecture, greatly reducing the number of trainable parameters for downstream tasks. Compared to GPT-3 175B fine-tuned with Adam, LoRA can reduce the number of trainable parameters by 10,000 times and the GPU memory requirement by 3 times. LoRA performs on-par or better than fine-tuning in model quality on RoBERTa, DeBERTa, GPT-2, and GPT-3, despite having fewer trainable parameters, a higher training throughput, and, unlike adapters, no additional inference latency.
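To make the quoted idea concrete, here is a toy NumPy sketch of a low-rank update: the pretrained matrix stays frozen and only the two small factors would be trained. The shapes and rank are invented for illustration; real implementations wrap the attention and MLP layers of a transformer rather than a bare matrix.

```python
# Illustrative sketch of the LoRA idea: freeze a pretrained weight matrix W
# and learn only a low-rank update B @ A. Shapes and rank are made up.
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, rank = 512, 512, 8

W_frozen = rng.normal(size=(d_out, d_in))      # pretrained weights, never trained
A = rng.normal(scale=0.01, size=(rank, d_in))  # trainable, small
B = np.zeros((d_out, rank))                    # trainable, starts at zero so the
                                               # low-rank update is initially a no-op

def lora_forward(x):
    """Output = frozen path + low-rank adapted path."""
    return W_frozen @ x + B @ (A @ x)

x = rng.normal(size=d_in)
y = lora_forward(x)

full_params = W_frozen.size
lora_params = A.size + B.size
print(f"trainable params: {lora_params} vs full fine-tuning: {full_params}")
```

In this toy setting the trainable parameter count drops from 262,144 to 8,192; the 10,000x figure in the quote comes from applying the same trick across a 175B-parameter model.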
Recent Developments Explained
An extremely lengthy rundown of recent LLM developments at Don’t Worry About the Vase.
See especially #12: What GPT-4 Can Do and #14: What GPT-4 Can’t Do.
What AI Can’t Do
What’s Next
Here’s a 72-page paper, Kaddour et al. (2023), summarizing Challenges and Applications of Large Language Models.
Also see Technology Review (Mar 2024):
In 2016, Chiyuan Zhang at MIT and colleagues at Google Brain published an influential paper titled “Understanding Deep Learning Requires Rethinking Generalization.” In 2021, five years later, they republished the paper, calling it “Understanding Deep Learning (Still) Requires Rethinking Generalization.” What about in 2024? “Kind of yes and no,” says Zhang. “There has been a lot of progress lately, though probably many more questions arise than get resolved.”